Covariance Regularization by Thresholding
Author
Abstract
This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and (log p)/n → 0, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general cross-validation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data.
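A minimal sketch of the hard-thresholding estimator described above, together with a simple split-based rule for choosing the threshold in the spirit of the resampling scheme the abstract mentions. The function names and the particular selection criterion (average Frobenius discrepancy between a thresholded half-sample estimate and the raw estimate from the other half) are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np


def hard_threshold(sigma_hat, t):
    """Set off-diagonal entries with |entry| <= t to zero; keep the diagonal."""
    out = np.where(np.abs(sigma_hat) > t, sigma_hat, 0.0)
    np.fill_diagonal(out, np.diag(sigma_hat))
    return out


def select_threshold_by_split(X, thresholds, n_splits=50, seed=None):
    """Pick the threshold minimizing an average Frobenius discrepancy between
    a thresholded estimate from one random half of the sample and the raw
    estimate from the other half (an assumed stand-in for the paper's
    resampling scheme)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    risks = np.zeros(len(thresholds))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        half = n // 2
        S1 = np.cov(X[perm[:half]], rowvar=False)
        S2 = np.cov(X[perm[half:]], rowvar=False)
        for i, t in enumerate(thresholds):
            risks[i] += np.linalg.norm(hard_threshold(S1, t) - S2, "fro") ** 2
    return thresholds[int(np.argmin(risks))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 200
    X = rng.standard_normal((n, p))          # toy data; true covariance is I_p
    S = np.cov(X, rowvar=False)
    grid = np.linspace(0.0, 1.0, 21)
    t_star = select_threshold_by_split(X, grid, seed=1)
    S_thr = hard_threshold(S, t_star)
    print("chosen threshold:", t_star)
    print("nonzero off-diagonal fraction:",
          (np.count_nonzero(S_thr) - p) / (p * (p - 1)))
```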
Related papers
Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly ...
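A small sketch of the screening step this abstract describes: threshold the entries of the sample covariance matrix at the graphical-lasso penalty λ, form the resulting adjacency graph, and read off its connected components. The cited result is that this vertex partition matches the one induced by the graphical-lasso solution; the code below only illustrates the thresholding side, with illustrative names.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def thresholded_components(S, lam):
    """Connected components of the graph with an edge (j, k), j != k,
    whenever |S[j, k]| > lam."""
    adj = (np.abs(S) > lam).astype(int)
    np.fill_diagonal(adj, 0)
    n_comp, labels = connected_components(csr_matrix(adj), directed=False)
    return n_comp, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 10))
    S = np.cov(X, rowvar=False)
    n_comp, labels = thresholded_components(S, lam=0.3)
    print("components:", n_comp, "labels:", labels)
```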
Computation-Risk Tradeoffs for Covariance-Thresholded Regression
We present a family of linear regression estimators that provides a fine-grained tradeoff between statistical accuracy and computational efficiency. The estimators are based on hard thresholding of the sample covariance matrix entries together with ℓ2-regularization (ridge regression). We analyze the predictive risk of this family of estimators as a function of the threshold and regularization pa...
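An illustrative sketch, under my own reading of this abstract, of an estimator that hard-thresholds the sample covariance of the predictors and then solves a ridge system with it; the exact estimator and tuning in the cited paper may differ.

```python
import numpy as np


def cov_thresholded_ridge(X, y, t, lam):
    """beta_hat = (T_t(S_xx) + lam * I)^{-1} s_xy, where S_xx is the sample
    covariance of X, T_t is entrywise hard thresholding (diagonal kept),
    and s_xy is the sample covariance between X and y."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    S = Xc.T @ Xc / n
    s_xy = Xc.T @ yc / n
    S_t = np.where(np.abs(S) > t, S, 0.0)
    np.fill_diagonal(S_t, np.diag(S))
    return np.linalg.solve(S_t + lam * np.eye(p), s_xy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 1.0                       # sparse true coefficient vector
    y = X @ beta + rng.standard_normal(n)
    print(cov_thresholded_ridge(X, y, t=0.1, lam=0.5)[:6].round(2))
```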
Covariance Estimation: The GLM and Regularization Perspectives
Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in ...
Discovering Sparse Covariance Structures with the Isomap
Regularization of covariance matrices in high dimensions is usually either based on a known ordering of variables or ignores the ordering entirely. This paper proposes a method for discovering meaningful orderings of variables based on their correlations using the Isomap, a non-linear dimension reduction technique designed for manifold embeddings. These orderings are then used to construct a sp...
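A rough sketch, with several assumed details, of how a correlation-based variable ordering could be recovered with Isomap: standardize the columns, treat each variable as a point (its standardized column, whose Euclidean distances are monotone in one minus the correlation), embed into one dimension with scikit-learn's Isomap, and sort variables by the embedding. The cited paper's actual construction may differ.

```python
import numpy as np
from sklearn.manifold import Isomap


def isomap_variable_order(X, n_neighbors=5):
    """Return an ordering of the p columns of X from a 1-D Isomap embedding
    of the standardized variables (an assumed reconstruction of the idea)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
    emb = Isomap(n_neighbors=n_neighbors, n_components=1).fit_transform(Z.T)
    return np.argsort(emb[:, 0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    print(isomap_variable_order(X))
```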
Posterior convergence rates for estimating large precision matrices using graphical models
We consider Bayesian estimation of a p×p precision matrix, where p can be much larger than the available sample size n. It is well known that consistent estimation in such an ultra-high dimensional situation requires regularization such as banding, tapering or thresholding. We consider a banding structure in the model and induce a prior distribution on a banded precision matrix through a Gaussi...